Established idea-sets may not update seamlessly. The tension between new and old views of nature is documented in Galileo's dialogs and now present in many fields. One evolutionary response may be to consider the simplicity of paths from various starting points to one goal. We illustrate with a look at two simplifications: the move from Lorentz-transform to metric-equation descriptions of space-time, and the move from classical to statistical thermodynamics with help from Boltzmann's choice-multiplicity & Shannon's uncertainty. Connections of the latter to correlation measures behind available work, model selection, and layered complexity are also explored. New strategies are exemplified with Appendices on: anyspeed vector-velocity addition, the energy-momentum half-plane lost to finite lightspeed, the modern distinction between proper and geometric accelerations, single map-frame views of anyspeed acceleration, quantifying risk with a handful of coins, available work in bits, quantitative model-selection, and the evolution of analog/digital complexity.
Evolution of well-worn approaches naturally encounters resistance from experts in the old [1]. For instance Martin Gardner, in his book on parity inversion [2], cites Hermann Kolbe's negative reaction to the prediction of carbon's tetrahedral nature by Jacobus van't Hoff (Chemistry's first Nobel Laureate). The hullabaloo [3] about the Nowak et al. paper [4] on models for evolving insect social behavior is a more recent research example, while participants in the content-modernization branch of physics education research (PER) have engaging tales on the education side [5]. For text publishers, however, even funerals may not mark progress, since choosers of a course text might understandably like to teach that course the way they learned it, whether they own the strategy or not.

One way to assess new approaches objectively may be to examine the algorithmically-shortest path to quantitative insight from each given starting point. For experts in the old, traditional approaches may be algorithmically-shortest even if they are not shortest for newcomers to a given subject. Differing perceptions might thus be put onto a rational footing. Here we examine textbook trends toward "metric-first" approaches to relativistic motion, and "entropy-first" approaches to statistical inference about physical systems, in hopes of helping individual teachers chart their own path through the evolving terrain.



II. PRINCIPLES 

From a given starting point, the strategy for putting together a concept map may be to minimize the number of: (i) assumptions and (ii) new concepts needed to make a given set of quantitative predictions possible. Drawings of such paths from various starting points might inspire one to evolve one's own starting point in teaching a given class over time.

FIG. 1: Proper-kinematics from the metric-equation first.

Note that we are weighing self-consistent approaches for their compactness, portability, and appropriateness in much the same way that different variable-changes in calculus, and coordinate-system choices in analytic geometry, buy more advantage for some tasks and less advantage for others. In that sense, we seek to apply the science of Bayesian model-selection to the evolution of what we teach.

Below we illustrate this with a few examples, based on 
content changes already underway in the evolving physics 
curriculum. Similar charts for your own approach to a 
given class, as it relates to the textbook in hand as well 
as the larger picture, may be worth putting together for 
sharing with your students and perhaps in electronic col- 
laboration spaces with the larger physics teaching com- 
munity as well. 



III. METRIC-BASED MOTION 

The traditional path via Lorentz transforms, by using two separate coordinate-frames with their own yardsticks and synchronized clocks, likely provides the most direct route to length contraction. On the other hand, shorter paths to time dilation [6], accelerated motion [7], and gravitation [8,9] follow from applying the flat-space metric equation to a single map-frame of yardsticks & synchronized clocks, treating it simply as an equation for the time elapsed on the clocks of a traveler.

The shift may seem uninteresting if one has mastered space-time through Lorentz transforms, and sees relativity as an extension of Newtonian physics to extreme situations. On the other hand, if one has been exposed mainly to single map-frame calculations and/or sees relativity as an everyday cause of magnetism & gravity, then the two-frame approach may serve up unneeded complication.

The traditional approach, for instance: (i) emphasizes symmetry between frames even when the home-frame (e.g. of a traveling clock or yardstick) is quite special, (ii) raises the dissonant spectre of relativistic mass [10-13], (iii) avoids use of proper-acceleration [14,15] as an integrative complement to geometric-accelerations (affine-connection effects) at low and high speeds, and (iv) misses out on insights that proper-velocity [16,17] offers, e.g. into relativistic velocity-addition (Appendix A 1) and the lightspeed limit (Appendix A 2). The single map-frame approach avoids these problems, with the result that elements of it are finding their way into texts on all levels.

In context of this evolving strategy we recommend 
drawing concept-interconnection maps (like those in the 
sections below) showing the steps from what you, and 
what separately your students, understand to what you'd 
like them to master in a given course. Intro-physics 
and relativity texts over the past couple of decades will 
give you an idea about how the Lorentz-transform ap- 
proach is already evolving to consider privileged frames 
e.g. like that of the traveling clock in time-dilation analy- 
sis and the traveling yardstick in length-contraction anal- 
ysis. Because it doesn't yet appear in texts, however, let's 
show briefly how a metric-first approach can build signif- 
icant insight from what intro-physics students already 
know. 

Begin with Minkowski's version of the Pythagorean theorem, i.e. the flat-space metric equation shown at the top of Figure 1. This introduces the time elapsed on the traveler's clock, i.e. proper-time $\tau$, and hence two new rate-of-travel parameters: namely the 3-vector proper-velocity $\vec{w} = d\vec{x}/d\tau$ and Lorentz-factor $\gamma = dt/d\tau$, in terms of the already-familiar coordinate-velocity $\vec{v} = d\vec{x}/dt$.
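As a quick numerical illustration of these definitions (a sketch, not from the original; the helper names and the choice of units with $c = 1$, i.e. lightyears per year, are assumptions), the snippet below computes $\gamma$ and $w = \gamma v$ from coordinate-velocity and locates the sub-/hyper-relativistic boundary $w = c$, which falls at $v = c/\sqrt{2}$.

```python
import math

C = 1.0  # lightspeed in [ly/y]; speeds below are in the same units

def lorentz_factor(v):
    """gamma = dt/dtau = 1/sqrt(1 - (v/c)^2), from the flat-space metric equation."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

def proper_velocity(v):
    """w = dx/dtau = gamma * v (map-distance per unit traveler-time)."""
    return lorentz_factor(v) * v

# Proper-velocity has no upper limit even though v < c.
for v in (0.1, 0.5, 0.9, 0.99, 0.999):
    print(f"v = {v:<6} gamma = {lorentz_factor(v):8.4f}  w = {proper_velocity(v):8.4f}")

# The boundary w = c falls at v = c/sqrt(2), where gamma = sqrt(2).
v_cross = C / math.sqrt(2.0)
print(proper_velocity(v_cross), lorentz_factor(v_cross))
```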

For anyspeed work, proper-velocity, unlike coordinate-velocity: (i) equals momentum per unit mass, (ii) adds vectorially (with an out-of-frame rescale) as shown in Appendix A 1, and (iii) has no upper limit, so that it most elegantly (at 1 [ly/ty], i.e. one lightyear of map-distance per traveler-year) parameterizes the transition from sub- to hyper-relativistic. Moreover, this metric-first analysis can be extended to treat constant proper-acceleration with equations that, in the unidirectional case, are particularly simple and connected to the low-speed equations for constant coordinate-acceleration.

The big caveat here is that simultaneity is defined en- 
tirely by the single reference map-frame. Robust quan- 
titative insight into the meaning of that, as well as 
of length-contraction, likely will have to await Lorentz 
transforms. 

As teachers we should probably choose a path through space-time for students which draws strength from our prior training and acquired insight into both paths, as well as from the path's connection to the past and future of students in each given course. For instance, with introductory physics students it's quite easy to tell them (even if the book doesn't) that time passes differently on different clocks so that, unless otherwise stated, time will be measured on a set of synchronized "map-clocks" affixed to the yardsticks used to measure position. Even better if we can also give them an updated view of forces in non-inertial frames (Appendix B 1), and of constant acceleration at any speed (Appendix B 2).

IV. MULTIPLICITY-BASED THERMODYNAMICS

The question here is: Do I start by introducing temperature in historical units and the zeroth law while saving entropy for the end, or do I start with choice-multiplicity and entropy so that the assumptions behind the ideal gas law, equipartition, and mass action are explicit from day one? Senior undergraduate texts now almost all do the latter, while only a small number of introductory texts have made the switch so far.

The simplest axiomatic path to: (a) the ideal gas equation, (b) equipartition, and (c) the law of mass action is likely Boltzmann's choice-multiplicity $W$, from which entropy $S = k\ln[W]$ and its derivatives may be defined. This choice-multiplicity, of course, is just the dimensionless count which underlies the familiar use of both information units and Joules per Kelvin [12].
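As a hedged illustration of this path, the sketch below assumes the standard classical ideal-gas multiplicity $W \propto V^N U^{\nu N/2}$ for $N$ molecules with $\nu$ quadratic degrees of freedom each (an assumption chosen for illustration, not necessarily the form used in this paper). Taking $S = k\ln W$, $1/T = \partial S/\partial U$ and $P/T = \partial S/\partial V$ numerically recovers the gas law $PV = NkT$ and equipartition $U = \frac{\nu}{2}NkT$. The numerical values and helper names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant [J/K] (exact SI value)

def entropy(U, V, N, nu):
    """S = k ln W for the assumed ideal-gas multiplicity W ~ V**N * U**(nu*N/2).
    The additive constant in ln W is dropped; it does not affect derivatives."""
    return K_B * (N * math.log(V) + 0.5 * nu * N * math.log(U))

def derivative(f, x, rel_step=1e-6):
    """Central finite-difference estimate of df/dx."""
    h = rel_step * x
    return (f(x + h) - f(x - h)) / (2.0 * h)

N, nu = 1.0e22, 3      # molecules; quadratic degrees of freedom per molecule (monatomic)
U, V = 62.0, 1.0e-3    # illustrative energy [J] and volume [m^3]

T = 1.0 / derivative(lambda u: entropy(u, V, N, nu), U)   # 1/T = dS/dU
P = T * derivative(lambda v: entropy(U, v, N, nu), V)      # P/T = dS/dV

print(f"T = {T:.1f} K, P = {P:.0f} Pa")
print("PV/(NkT)      =", P * V / (N * K_B * T))            # gas law: close to 1
print("U/((nu/2)NkT) =", U / (0.5 * nu * N * K_B * T))     # equipartition: close to 1
```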

Historical approaches introduce these consequences as empirical and/or informally-useful relationships, without clear definition of their underlying mechanism and assumptions, and typically with discussions of entropy/multiplicity (the horse) following these consequences (the cart). Such approaches do not provide insight into: (i) the quantitative limitations of these concepts, or (ii) strategies for moving beyond those limitations, e.g. to systems in which subsystem correlations cannot be ignored.

Another reason to introduce multiplicity first is that 
the laws of thermodynamics (short of two physical pos- 
tulates) follow therefrom as well. The zeroth law follows 
from the fact that the largest number of states is available 
when the uncertainty slopes of two subsystems (recipro- 
cal temperatures for the energy observable) equilibrate 
as a conserved quantity is shared. 
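A minimal numerical sketch of this zeroth-law argument, assuming two Einstein solids (a textbook model not discussed in this paper, with multiplicity $\binom{q+N-1}{q}$ for $q$ energy quanta shared among $N$ oscillators): the split of a fixed number of quanta that maximizes the total multiplicity is the one at which the two uncertainty slopes $d\ln W/dq$ nearly match. Subsystem sizes and function names are illustrative.

```python
from math import comb, log

def ln_multiplicity(n_osc, q):
    """ln of the Einstein-solid multiplicity C(q + N - 1, q)."""
    return log(comb(q + n_osc - 1, q))

def slope(n_osc, q):
    """Uncertainty slope d(ln W)/dq (reciprocal temperature per energy quantum),
    estimated with a central difference; requires q >= 1."""
    return 0.5 * (ln_multiplicity(n_osc, q + 1) - ln_multiplicity(n_osc, q - 1))

N_A, N_B, Q = 300, 200, 100   # oscillators in each subsystem, total quanta shared

# ln of the total multiplicity for every way of splitting Q quanta between A and B.
ln_total = {qa: ln_multiplicity(N_A, qa) + ln_multiplicity(N_B, Q - qa) for qa in range(Q + 1)}
qa_best = max(ln_total, key=ln_total.get)

print("most probable split:", qa_best, "quanta in A,", Q - qa_best, "in B")
print("slope A:", slope(N_A, qa_best), " slope B:", slope(N_B, Q - qa_best))
```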

The first law, oft described as a statement of energy conservation, in fact arises from maximum entropy inference as a relation between ordered and disordered changes in any observable, whether they are conserved on transfer between subsystems or not. Likewise for the second law, whose physics actually comes not from statistical inference but from the assumption that mutual information available on the state of an isolated system will not increase over time.

Finally the very definition of reciprocal temperature as 
an uncertainty slope will convince many that the change 
in state-uncertainty about any finite system, per unit 
change in energy, is likely to be finite. Hence reciprocal- 
temperature's infinity (the absolute-zero of temperature) 
is likely inaccessible. This natural definition of tempera- 
ture has the added advantage that it prohibits one from 
approaching absolute-zero from negative or positive di- 
rections, and shows that the negative absolute-zero ap- 
proachable e.g. by spin systems with a population inver- 
sion is as far away from positive absolute-zero as you can 
get. 
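For a hedged illustration of reciprocal temperature as an uncertainty slope, the sketch below assumes $N$ two-state spins with $n$ of them excited, energy $E = n\varepsilon$ and multiplicity $\binom{N}{n}$ (a standard model chosen here for illustration). The slope $1/kT = d\ln W/dE$ stays finite for every $n$ with $0 < n < N$, passes through zero at half filling, and turns negative under population inversion. The function name `coldness` borrows Garrod's term and is otherwise hypothetical.

```python
from math import comb, log

def ln_W(N, n):
    """ln of the multiplicity C(N, n) for n of N two-state spins in the upper level."""
    return log(comb(N, n))

def coldness(N, n, eps=1.0):
    """Reciprocal temperature 1/kT = d(ln W)/dE with E = n*eps (central difference;
    valid for 1 <= n <= N - 1)."""
    return (ln_W(N, n + 1) - ln_W(N, n - 1)) / (2.0 * eps)

N = 1000
for n in (1, 100, 400, 500, 600, 900, 999):
    print(f"n = {n:4d}  1/kT = {coldness(N, n):+.4f}")
```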

Examples of the power in this recasting of familiar rules include many senior undergraduate thermal physics texts, like those by Kittel & Kroemer [18], Dan Schroeder [19], and Claude Garrod [20] (who refers to reciprocal temperature [21] as coldness), Tom Moore & Dan Schroeder's AJP paper [22], Tom Moore's introductory physics Unit T [ref], etc.

V. COLLATERAL CONNECTIONS 

We've now covered two paradigm-shifts that have a well-defined place in the physics curriculum. The approach taken to them in a given class should be informed by both teacher & student backgrounds, as well as by course objectives. The second paradigm-shift makes contact with other developments of interest to physics students as well.

To explore this we step back from uncertainties to 
probability measures, and then forward from uncertain- 
ties to correlation measures, to show how the second 
simplification also allows physics to make contact with a 
number of other lively disciplines. Because of the physics 
in between, out-of-discipline students may never hear 
about these connections if they aren't at least mentioned 
in one of their physics classes. 

A. surprisals 

Recall that information units can be introduced by the statement that #choices $= 2^{\#\text{bits}}$. Also very small probabilities $p$ can be put into everyday terms as the surprisal [23] $s = n$ bits of tossing $n$ coins all heads up, since $p = 1/2^{\#\text{bits}}$, with the added advantage that surprisals add whenever their probabilities multiply (Appendix C 1). Evidence in bits [24] for a true-false proposition can similarly be written as $e[p] = s[1-p] - s[p]$, where surprisal is $s[p] = \log_2[1/p]$.
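The surprisal and evidence formulas above translate directly into code. In this minimal sketch the function names are illustrative assumptions; it checks that ten heads in ten tosses carries 10 bits of surprisal, that surprisals add when probabilities multiply, and that evidence vanishes at $p = 1/2$.

```python
import math

def surprisal_bits(p):
    """s[p] = log2(1/p): the number of coin tosses that must all land heads
    to be as unlikely as probability p."""
    return math.log2(1.0 / p)

def evidence_bits(p):
    """Evidence for a true-false proposition of probability p: e[p] = s[1 - p] - s[p]."""
    return surprisal_bits(1.0 - p) - surprisal_bits(p)

print(surprisal_bits(1 / 2**10))                                               # ten heads in ten tosses
print(surprisal_bits(0.5 * 0.25), surprisal_bits(0.5) + surprisal_bits(0.25))  # surprisals add
print(evidence_bits(0.5), evidence_bits(0.8))                                  # 4:1 odds give about 2 bits
```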

All of these applications rely on the fact that probabilities between 0 and 1 can be written as multiplicities $w_p = 1/p$ between 1 and $+\infty$, or as surprisals between 0 and $+\infty$ using information units determined by the constant $k$ in the expression $s_p = k\ln[1/p]$. This surprisal ⇔ multiplicity ⇔ probability inter-conversion is summarized by:

$$0 \le s_p = k\ln[w_p] = k\ln\!\left[\frac{1}{p}\right] < \infty, \qquad (2)$$

where of course the units are bits if $k = 1/\ln[2]$.

FIG. 2: Choice multiplicity ⇒ gas law, equipartition & mass action.

B. average surprisals

The treatments of the ideal gas law, equipartition, mass action, and the laws of thermodynamics in the previous section connect to this tradition by defining uncertainty or entropy $S$ as an average surprisal, e.g. in J/K, between 0 and $+\infty$; Boltzmann's multiplicity $W$ between 1 and $+\infty$ as $e^{S/k}$, where $k$ is Boltzmann's constant; and $1/W$ as a reciprocal-multiplicity between 0 and 1. Their relevance to the thermal side of physics education has been discussed above.

More generally, the interconversion for the average surprisal, uncertainty, or entropy associated with predicted probability-set $q$, as measured by operating probability-set $p$, can be written:

$$0 \le S_{p/p} \le S_{q/p} = k\ln[W_{q/p}] = k\sum_{i=1}^{N} p_i \ln\!\left[\frac{1}{q_i}\right] < \infty. \qquad (3)$$

Thus $S_{q/p}$, e.g. for an observation, is (in bits) the average surprisal when the expected model $q$ differs from the operating-model $p$.

Although written for a discrete probability-set, the expression is naturally adapted to continuous as well as quantum mechanical probability-sets [25,26]. In this context, natural (as distinct from historical) units for temperature become energy per unit information, and for heat capacity become bits [27].
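A minimal sketch of Eq. (3) in bits (i.e. with $k = 1/\ln 2$), using hypothetical three-outcome probability-sets chosen for illustration: an observer expecting uniform odds $q$ when outcomes actually follow $p$ is, on average, more surprised than one using $p$ itself. The function names are assumptions.

```python
import math

def avg_surprisal_bits(q, p):
    """S_{q/p} = sum_i p_i log2(1/q_i): average surprisal, in bits, of an observer who
    expects probabilities q when outcomes actually follow the operating set p.
    Any q_i = 0 where p_i > 0 would make this infinite (and raise ZeroDivisionError)."""
    return sum(pi * math.log2(1.0 / qi) for pi, qi in zip(p, q) if pi > 0)

def effective_multiplicity(q, p):
    """W_{q/p} = 2**S_{q/p}: effective number of equally likely choices."""
    return 2.0 ** avg_surprisal_bits(q, p)

p = [0.5, 0.25, 0.25]   # operating model (illustrative)
q = [1/3, 1/3, 1/3]     # expected, mis-specified model (illustrative)
print(avg_surprisal_bits(p, p), avg_surprisal_bits(q, p))          # 1.5 vs about 1.585 bits
print(effective_multiplicity(p, p), effective_multiplicity(q, p))
```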

Note that the upper limit on $S_{p/p}$ is $\log_2[N]$ bits. Also the fact that $S_{p/p} \le S_{q/p}$, i.e. that measurements using the wrong model $q$ are on average always more surprised by observational data than those using the operating-model $p$, underlies maximum-likelihood curve-fitting and Bayesian model-selection, as well as the positivity of the correlation and thermodynamic availability measures discussed below.

Thus in this two-distribution case, $1 \le W_{q/p} < +\infty$ is an effective choice-multiplicity for expected probability-set $q$ in the face of operating probability-set $p$. In general $W_{p/p} \le W_{q/p}$. For the uniform $N$-probability set $u_i = 1/N$, with $i$ running from 1 to $N$, we can also say that

$$W_{p/p} \le W_{u/p} = N = W_{u/u} \le W_{p/u}.$$

FIG. 3: Cross-disciplinary applications for log-probability measures in statistical inference.
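The chain of inequalities for the uniform set can be spot-checked numerically. The sketch below assumes all $p_i > 0$ (the random operating set is arbitrary and illustrative) and computes $W_{q/p} = \exp\!\left(\sum_i p_i \ln[1/q_i]\right)$, which does not depend on the choice of $k$.

```python
import math
import random

def W(q, p):
    """Effective multiplicity W_{q/p} = exp(sum_i p_i ln(1/q_i)), independent of units k."""
    return math.exp(sum(pi * math.log(1.0 / qi) for pi, qi in zip(p, q) if pi > 0))

random.seed(1)
N = 5
weights = [random.random() + 0.01 for _ in range(N)]
p = [x / sum(weights) for x in weights]   # an arbitrary operating set with all p_i > 0
u = [1.0 / N] * N                          # the uniform set

chain = [W(p, p), W(u, p), W(u, u), W(p, u)]
print(chain)   # non-decreasing, with the middle two equal to N
```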



C. net surprisals

The tracking of subsystem correlations has taken a back seat in traditional thermodynamic use of log-probability measures. This is illustrated e.g. by the traditional treatment of subsystem entropies as additive, in effect promising that correlations, e.g. between gas atoms in two volumes separated by a barrier, can be safely ignored. More generally, however, subsystem correlations (e.g. between a sent and a received message, or between traits of a parent and of a child) are of central importance. In fact the maximum entropy discussed above is nothing more than minimum KL-divergence with a uniform prior [28], so that physicists expert in its application to analog systems can play a pivotal role in informing students who take physics courses about these connections across disciplines.

In particular the foregoing are backdrop to the paradigm-shift which broke out of physics into the wide world of statistical inference in the mid-20th century [29]. We'll touch on only three of the many areas that it's connecting together today, based on their relevance to cross-disciplinary interests of students in physics classes. The specific application areas are: (i) thermodynamic availability as in Appendix D 1, (ii) algorithmic model selection as in Appendix D 2, and (iii) the evolution of complexity as in Appendix D 3. The surprisal ⇔ multiplicity ⇔ probability interconversion for these correlation analyses may be written:

$$0 \le I_{q/p} = k\ln[M_{q/p}] = k\sum_{i=1}^{N} p_i \ln\!\left[\frac{p_i}{q_i}\right] < \infty. \qquad (4)$$
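The net surprisal here is the difference of the two average surprisals of the previous subsection, $I_{q/p} = S_{q/p} - S_{p/p}$, so that $M_{q/p} = W_{q/p}/W_{p/p}$. A minimal sketch in bits, with illustrative probability-sets and hypothetical function names:

```python
import math

def net_surprisal_bits(q, p):
    """KL-divergence I_{q/p} = sum_i p_i log2(p_i/q_i) >= 0, in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def matchup_multiplicity(q, p):
    """M_{q/p} = 2**I_{q/p} >= 1, a choice-reduction factor."""
    return 2.0 ** net_surprisal_bits(q, p)

p = [0.5, 0.25, 0.25]   # operating set (illustrative)
q = [1/3, 1/3, 1/3]     # reference set (illustrative)
I = net_surprisal_bits(q, p)
print(I, matchup_multiplicity(q, p))   # about 0.085 bits, M slightly above 1
```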



Log-probability measures are useful for tracking subsystem-correlations in digital as well as in analog complex systems. In particular, tools based on Kullback-Leibler divergence $I_{q/p} \ge 0$ (the negative of Shannon-Jaynes entropy) and the matchup-multiplicity or choice-reduction-factor $M_{q/p}$ associated with reference probability-set $q$ have proven useful: (i) to engineers for measuring available work or exergy in thermodynamic systems [30], (ii) to communication scientists and geneticists for studies of regulatory-protein binding-site structure [31], relatedness [32], network structure, & replication fidelity [33,34], and (iii) to behavioral ecologists wanting to select, from a set of simple models, the one which is least surprised by experimental data [35,36] from a complex reality.
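A toy version of "least surprised" model selection, as a hedged sketch: given hypothetical observed counts for three outcomes, each candidate model is scored by its average surprisal per observation, and the lowest score wins (equivalent to maximum likelihood). The counts and models are invented for illustration, and the sketch ignores any penalty for the number of fitted parameters, which information criteria such as AIC add.

```python
import math

def avg_surprisal_per_obs_bits(model, counts):
    """Average surprisal (bits per observation) of a model given observed counts:
    sum_i f_i log2(1/model_i), with f_i the observed frequencies."""
    n = sum(counts)
    return sum((c / n) * math.log2(1.0 / m) for c, m in zip(counts, model) if c > 0)

# Hypothetical observed counts and hypothetical candidate models.
counts = [48, 30, 22]
models = {
    "uniform": [1/3, 1/3, 1/3],
    "skewed":  [0.5, 0.3, 0.2],
    "extreme": [0.8, 0.1, 0.1],
}
scores = {name: avg_surprisal_per_obs_bits(m, counts) for name, m in models.items()}
best = min(scores, key=scores.get)   # the model least surprised by the data
print(scores, "->", best)
```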

In the context of this idea-set, the logical schematic in Figure 3 illustrates connections that often go unmentioned between what are now-classical application areas in their specialized fields. It thus suggests that physicists, particularly thanks to their long experience with log-probability measures in analog systems, can play a key role in the cross-disciplinary application of informatics to complex systems.

These multi-moment correlation-measures have 2nd-law teeth, making them relevant to quantum computing [37], and they enable one to distinguish pair from higher-order correlations, making them relevant to the exploration of order-emergence in a wide range of biological systems [38,39]. They may be especially useful in addressing challenges associated with the sustainability of multi-layer complex systems [40].
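One standard illustration of a correlation that pair measures miss (a textbook example, not one from this paper): if $x$ and $y$ are independent fair bits and $z = x \oplus y$, every pair of variables is uncorrelated, yet the three together share one bit. The sketch computes pairwise mutual informations and the multi-information (total correlation), i.e. the KL-divergence of the joint distribution from the product of its marginals. Helper names are illustrative.

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(samples):
    """Shannon entropy (bits) of the empirical distribution of hashable samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

# Four equally likely joint states (x, y, z) with z = x XOR y.
states = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def column(i):
    return [s[i] for s in states]

def pair(i, j):
    return [(s[i], s[j]) for s in states]

H = [entropy_bits(column(i)) for i in range(3)]
pair_mi = {(i, j): H[i] + H[j] - entropy_bits(pair(i, j))
           for i, j in ((0, 1), (0, 2), (1, 2))}
total_correlation = sum(H) - entropy_bits(states)   # KL from product of marginals

print(pair_mi, total_correlation)   # all pairs 0 bits, three-way 1 bit
```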



VI. DISCUSSION 

Similar analyses might also help each of us decide when it is (and is not) appropriate to spend time in the educational arena on, e.g.: (i) geometric-algebra approaches [ref] to complex numbers & cross-products, (ii) energy [ref] & least-action [ref] based introductions to mechanics, (iii) vector-potential introductions to magnetism [ref], and (iv) explore-all-paths introductions to quantum mechanics [ref]. The approach may even come in handy for mediating differences in research strategy as well, e.g. in deciding how much time to spend (in the context of a particular problem) on: (a) CPT approaches to the application of non-Hermitian Hamiltonians [ref], (b) molecule-code as distinct from kin-selection models of evolving eusocial or altruistic behavior [ref], etc.



VII. CONCLUSIONS 

In short, both quantitative and schematic considerations of the algorithmic path to key deliverables, from your & your audience's conceptual starting points, may help point you toward approaches that help your students become maximally-informed in minimum time. These may not lessen "the detailed work of content modernization" [ref], but they may help provide the process with useful direction tailored to our individual points of reference. What would your concept maps look like in this context?
FIG. 4: Adding $w/c = 1$ proper-velocity vectors.

Bayesian informatics development course which helped to 
nail down some of the inter-connections discussed here. 



Appendix A: Proper-velocity 

Minkowski's flat-space metric-equation naturally defines the relation between traveler-time elapsed ($\delta\tau$) and the distance/time between events defined with respect to the yardsticks ($\delta x$) and synchronized-clocks ($\delta t$) of a single map-frame, thus defining Lorentz factor & proper-velocity as alternate ways to describe rate of travel.

Proper-velocity, referred to by Shurcliff as the "minimally-variant" parameter for describing position's rate of change, can simplify our understanding of many relativistic processes. Here we discuss two that are relevant to an introductory physics course.

1. vector addition 

A useful mnemonic for relative motion in the Newtonian world is:

$$\vec{v}_{AC} = \vec{v}_{AB} + \vec{v}_{BC}, \qquad (A1)$$

where e.g. $\vec{v}_{AB}$ is the vector velocity of object A with respect to object B. Note that in general $\vec{v}_{AB} = -\vec{v}_{BA}$, and in sums "a common middle letter cancels out".

For a relation that works for unidirectional velocity-addition even at coordinate-speeds $v$ near lightspeed $c$, one might use the similar relationship:

$$w_{AC} = \gamma_{AC} v_{AC} = \gamma_{AB}\gamma_{BC}\,(v_{AB} + v_{BC}), \qquad (A2)$$

where the proper-velocity $w = dx/d\tau = p/m$ is map-distance $x$ traveled per unit time $\tau$ on traveler-clocks, coordinate-velocity $v = dx/dt$ with $v < c$ is the usual map-distance traveled per unit time $t$ on map-clocks, and Lorentz-factor $\gamma = dt/d\tau = 1/\sqrt{1-(v/c)^2} = E/mc^2 \ge 1$ is the "speed of map-time per unit traveler-time", provided that the map-frame defines simultaneity.

FIG. 5: Kinetic energy vs. momentum plot.

Thus unidirectional coordinate-velocities add, but Lorentz-factors multiply, when forming the proper-velocity sum. This allows colliders (sometimes with Lorentz-factors well over $10^5$) to explore much higher-speed collisions than would be possible with a fixed-target accelerator!
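A quick numerical check of Eq. (A2), as a sketch in units with $c = 1$ (helper names are illustrative): the proper-velocity sum agrees with $\gamma_{AC}v_{AC}$ computed from the usual relativistic velocity-addition formula, and for head-on beams of equal speed the Lorentz factor of one beam seen from the other is $\gamma^2(1 + v^2/c^2)$, roughly $2\gamma^2$ at high energy, compared with $\gamma$ for a fixed target.

```python
import math

C = 1.0  # lightspeed; speeds as fractions of c

def gamma(v):
    """Lorentz factor for coordinate-speed v, |v| < c."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

def proper_velocity_sum_1d(v_ab, v_bc):
    """Eq. (A2): w_AC = gamma_AB * gamma_BC * (v_AB + v_BC)."""
    return gamma(v_ab) * gamma(v_bc) * (v_ab + v_bc)

def coordinate_velocity_sum_1d(v_ab, v_bc):
    """Standard relativistic velocity-addition formula, for comparison."""
    return (v_ab + v_bc) / (1.0 + v_ab * v_bc / C**2)

v_ab, v_bc = 0.6, 0.8
v_ac = coordinate_velocity_sum_1d(v_ab, v_bc)
print(proper_velocity_sum_1d(v_ab, v_bc), gamma(v_ac) * v_ac)   # both about 2.9167

# Head-on collider: each beam moves at v_beam in the lab, so one beam's Lorentz factor
# in the other's frame is gamma**2 * (1 + v**2/c**2), versus gamma for a fixed target.
v_beam = 0.99
gamma_collider = gamma(coordinate_velocity_sum_1d(v_beam, v_beam))
print(gamma(v_beam), gamma_collider, gamma(v_beam) ** 2 * (1 + v_beam**2))
```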

In the any-speed and any-direction case this becomes:

$$\vec{w}_{AC} = \gamma_{AC}\vec{v}_{AC} = (\vec{w}_{AB})_C + \vec{w}_{BC}, \qquad (A3)$$

where C's view of the out-of-frame proper-velocity $(\vec{w}_{AB})_C$ is in the same direction as $\vec{w}_{AB} = \gamma_{AB}\vec{v}_{AB}$ but rescaled in magnitude by a factor of $\gamma_{BC} + \vec{w}_{AB}\cdot\vec{w}_{BC}/(c^2(1+\gamma_{AB})) > 0$, as illustrated for "unit proper velocities" in Fig. 4. Hence the original low-speed equation generalizes nicely when proper-velocity $\vec{w} = d\vec{x}/d\tau$ is used instead of coordinate-velocity $\vec{v} = d\vec{x}/dt$.
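A sketch of Eq. (A3) for arbitrary directions, in units with $c = 1$ and with illustrative helper names. It adds two perpendicular unit proper-velocities like those in Fig. 4, and as a consistency check the resulting Lorentz factor matches the standard result $\gamma_{AC} = \gamma_{AB}\gamma_{BC}(1 + \vec{v}_{AB}\cdot\vec{v}_{BC}/c^2) = \gamma_{AB}\gamma_{BC} + \vec{w}_{AB}\cdot\vec{w}_{BC}/c^2$.

```python
import math

C = 1.0  # lightspeed; proper-velocities in units of c

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gamma_from_w(w):
    """Lorentz factor from a proper-velocity 3-vector: gamma = sqrt(1 + |w|^2/c^2)."""
    return math.sqrt(1.0 + dot(w, w) / C**2)

def add_proper_velocities(w_ab, w_bc):
    """Eq. (A3): w_AC = (w_AB)_C + w_BC, where (w_AB)_C is w_AB rescaled by
    gamma_BC + (w_AB . w_BC) / (c^2 (1 + gamma_AB))."""
    g_ab, g_bc = gamma_from_w(w_ab), gamma_from_w(w_bc)
    scale = g_bc + dot(w_ab, w_bc) / (C**2 * (1.0 + g_ab))
    return tuple(scale * a + b for a, b in zip(w_ab, w_bc))

# Two unit proper-velocities (|w| = c) at right angles.
w_ab = (1.0, 0.0, 0.0)
w_bc = (0.0, 1.0, 0.0)
w_ac = add_proper_velocities(w_ab, w_bc)
print(w_ac, gamma_from_w(w_ac))   # (sqrt(2), 1, 0) with gamma = 2
```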

2. energy vs. momentum 

Vector proper-velocity $\vec{w}$, like vector momentum $\vec{p} = m\vec{w} = m\gamma\vec{v}$, has no upper limit on its size. Likewise for scalar Lorentz-factor $\gamma$, which like kinetic energy $K = (\gamma - 1)mc^2$ has no intrinsic upper limit.

Hence Newton likely imagined that, by choosing the appropriate mass $m$, objects may be found with any desired mix of kinetic energy $K$ and translational momentum $p$. As shown in Figure 5, however, Minkowski's metric equation:

$$(c\,\delta\tau)^2 = (c\,\delta t)^2 - \delta\vec{x}\cdot\delta\vec{x}, \qquad (A4)$$

by defining Lorentz-factor in terms of coordinate-velocity, in effect lowers a curtain on kinetic-energy/momentum space by making only the lower right half of it (where $K < pc$) accessible to moving objects.
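The inaccessible half-plane can be checked with a few lines of code. This sketch (units with $m = c = 1$, so momentum is in units of $mc$ and energy in units of $mc^2$; function names are illustrative) compares the relativistic $K = \sqrt{(pc)^2 + (mc^2)^2} - mc^2$, which always stays below $pc$, with the Newtonian $K = p^2/2m$, which exceeds $pc$ once $p > 2mc$.

```python
import math

C = 1.0  # lightspeed; with m = 1, momentum is in units of mc and energy in units of mc^2

def kinetic_energy_relativistic(p, m=1.0):
    """K = sqrt((pc)^2 + (mc^2)^2) - mc^2, equivalently (gamma - 1) m c^2."""
    return math.sqrt((p * C) ** 2 + (m * C**2) ** 2) - m * C**2

def kinetic_energy_newtonian(p, m=1.0):
    """Low-speed approximation K = p^2 / (2m)."""
    return p**2 / (2.0 * m)

for p in (0.01, 0.1, 1.0, 2.0, 10.0, 1000.0):
    K_rel, K_newt = kinetic_energy_relativistic(p), kinetic_energy_newtonian(p)
    print(f"p = {p:7}  K_rel/pc = {K_rel / (p * C):.4f}  K_newton/pc = {K_newt / (p * C):.4f}")
```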

The log-log plot in Figure 5, which also has lines of constant mass and constant coordinate-velocity, thus provides students with an integrative view of kinetic-energy and momentum space for a wide range of objects in (and beyond) everyday experience. Thus, for example, if one points these relationships out as early as possible in an intro-physics course (perhaps as early as the kinematics-section on relative velocities, if one takes the time to distinguish traveler-time $\tau$ from map-time $t$ in defining Lorentz factor $\gamma = dt/d\tau$ as in the previous section), then one may find opportunities again and again to refer back to it as new phenomena come up in the course.

Appendix B: Proper-acceleration 

By sticking with a single map-frame to define extended simultaneity, one finds that equations for accelerated motion also extrapolate nicely from low to high speed. The organizing parameter for this extension is the set of three non-zero (spatial) components of an object's acceleration four-vector as seen from the vantage point of the object itself.

This 3-vector is referred to as the object's proper-acceleration $\vec{\alpha}$. Again we discuss two uses for this quantity that are most relevant to students in an intro-physics course.



1. proper or geometric? 

For intro-physics students, even at low speeds, one might point out that there are experimentally two kinds of acceleration: proper-accelerations associated with the push/pull of external forces, and geometric-accelerations caused by choice of a reference-frame that is not geodesic, i.e. a local reference coordinate-system that is not "in free-float".

Typically proper-accelerations are felt through their points of action, e.g. through forces on the bottom of your feet, or through interaction with electromagnetic fields. On the other hand, geometric-accelerations associated with one's coordinate choice are associated with affine-connection forces (an extended version of the Newtonian concept of inertial force) that act on every ounce of an object's being.

Affine-connection effects either vanish when seen from the vantage point of a local free-float or geodesic frame (an extended version of the Newtonian concept of inertial frame), or give rise to non-local force effects on your mass distribution which cannot be made to disappear. Some of these are summarized in Table I.

Although the following need not be shared with students, the assertion above contains the essence of general relativity's equivalence principle, which guarantees that Newton's laws can be helpful locally in accelerated frames and curved spacetime, provided that we invoke inertial forces to explain the geometric-accelerations which operate in those frames.

TABLE I: Acceleration types and various forces.

The mathematics of geometric accelerations comes from the fact that in general relativity an object's coordinate acceleration (as distinct from only its proper-acceleration 4-vector $A^\lambda$) is equal to:

$$\frac{dU^\lambda}{d\tau} = A^\lambda - \Gamma^\lambda{}_{\mu\nu}\,U^\mu U^\nu, \qquad (B1)$$

where $U^\mu = dx^\mu/d\tau$ is the object's 4-velocity and geometric-accelerations are represented by the affine-connection term $\Gamma$ on the right-hand side. These may be the sum of as many as sixteen separate velocity- and position-dependent terms. Coordinate acceleration goes to zero whenever proper-acceleration is exactly canceled by that connection term, and thus when physical and inertial forces add to zero.



2. accelerated roundtrips 

For unidirectional (1+1)D motion, the rapidity or hyperbolic velocity angle $\eta$ simply connects the interchangeable velocity parameters Lorentz-factor $\gamma = dt/d\tau$, proper-velocity $w = dx/d\tau$ and coordinate-velocity $v = dx/dt$ via:

$$\eta = \sinh^{-1}\!\left[\frac{w}{c}\right] = \tanh^{-1}\!\left[\frac{v}{c}\right] = \pm\cosh^{-1}[\gamma]. \qquad (B2)$$
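Eq. (B2) can be spot-checked with Python's hyperbolic functions. In this sketch ($c = 1$, illustrative helper names) a coordinate-velocity of $0.8c$ gives $\gamma = 5/3$ and $w = 4c/3$, and the final lines show that, for unidirectional motion, rapidities simply add, reproducing the relativistic velocity-addition result.

```python
import math

C = 1.0  # lightspeed

def rapidity_from_v(v):
    """eta = atanh(v/c)."""
    return math.atanh(v / C)

def parameters_from_rapidity(eta):
    """Eq. (B2) inverted: gamma = cosh(eta), w = c sinh(eta), v = c tanh(eta)."""
    return math.cosh(eta), C * math.sinh(eta), C * math.tanh(eta)

eta = rapidity_from_v(0.8)
gamma, w, v = parameters_from_rapidity(eta)
print(eta, gamma, w, v)   # gamma = 5/3 and w = 4/3, up to rounding

# Unidirectional boosts: rapidities add, so v_AC = c tanh(eta_AB + eta_BC).
eta_ab, eta_bc = rapidity_from_v(0.6), rapidity_from_v(0.8)
v_ac = C * math.tanh(eta_ab + eta_bc)
print(v_ac, (0.6 + 0.8) / (1 + 0.6 * 0.8))
```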



These parameters may then be used to express the proper-acceleration $\alpha$ experienced by an object traveling with respect to a map-frame of co-moving yardsticks and synchronized clocks in flat spacetime, in terms of its coordinate-acceleration $a$ (which cannot be held constant at high speed), as:

$$\alpha = \frac{1}{\gamma}\frac{dw}{d\tau} = \gamma^3 a, \quad\text{where}\quad a = \frac{dv}{dt}. \qquad (B3)$$

This yields three integrals of constant proper-accelerated motion that reduce to the familiar equations of constant coordinate-acceleration at low speeds:

$$\alpha = \frac{\Delta w}{\Delta t} = c\,\frac{\Delta\eta}{\Delta\tau} = c^2\,\frac{\Delta\gamma}{\Delta x} \;\xrightarrow{\;v \ll c\;}\; a \simeq \frac{\Delta v}{\Delta t} = \frac{1}{2}\frac{\Delta(v^2)}{\Delta x}. \qquad (B4)$$

These in turn allow one, for example, to write out analytical solutions (cf. Fig. 6) for round-trips involving constant 1 gee $\simeq 1.03$ [ly/y$^2$] accelerated/decelerated travel between stars.

FIG. 6: 1-gee proper-acceleration roundtrips.
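A minimal sketch of such a trip using the integrals in Eq. (B4): starting and ending at rest, accelerate at constant proper-acceleration for the first half of the distance and decelerate for the second. Units are lightyears and years with $c = 1$ [ly/y] and $\alpha \simeq 1.03$ [ly/y$^2$]; the distances are illustrative and the function name is hypothetical. A full round-trip is twice the one-way values.

```python
import math

C = 1.0        # lightspeed [ly/y]
ALPHA = 1.03   # 1 gee expressed in [ly/y^2]

def accelerate_then_decelerate(distance, alpha=ALPHA):
    """One-way map-time and traveler-time (years) to cover `distance` (ly), starting and
    ending at rest, with constant proper-acceleration alpha for the first half and
    deceleration for the second half. Uses eq. (B4):
    alpha = c^2 d(gamma)/dx = dw/dt = c d(eta)/d(tau)."""
    gamma_mid = 1.0 + alpha * (distance / 2.0) / C**2   # from alpha = c^2 Δgamma/Δx
    w_mid = C * math.sqrt(gamma_mid**2 - 1.0)           # proper-velocity at the midpoint
    eta_mid = math.acosh(gamma_mid)                     # rapidity at the midpoint
    t_half = w_mid / alpha                              # from alpha = Δw/Δt
    tau_half = C * eta_mid / alpha                      # from alpha = c Δeta/Δtau
    return 2.0 * t_half, 2.0 * tau_half

for d in (0.01, 1.0, 4.37, 100.0):   # one-way distances in lightyears (illustrative)
    t, tau = accelerate_then_decelerate(d)
    print(f"d = {d:7.2f} ly: map-time {t:8.3f} y, traveler-time {tau:6.3f} y")
```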



Appendix C: Choice multiplicities 

Senior physics courses have already been rearranged to reflect the fundamental role that multiplicity (and its logarithm, namely entropy) plays in understanding and predicting behaviors. Although intro-physics courses are weaker in this respect, books like Tom Moore's "Six Ideas" [ref] have put choice-multiplicity where it belongs, at the start of the thermo-chapters.

Hence the only section in this Appendix is one for stu- 
dents with virtually no math background. The hope is 
that teachers will individually explore ways to introduce 
the connection between bits and J/K, while at the same 
time nurturing an appetite for textbook revisions that 
better communicate the relation between thermal physics 
and information theory downstream. 
FIG. 7: Logarithmic measures of probability and odds.



1. quantifying risk 

Surprisal in bits (defined by probability $= 1/2^{\#\text{bits}}$) might be useful to citizens in assessing risk and/or standards of evidence (cf. Fig. 7), because of its simple, intuitive, and testable ability to connect even very small probabilities with one's experience at tossing coins. For example, the surprisal of dying from a smallpox vaccination (one in a million) is about 19.9 bits (like 20 heads in 20 tosses), while the surprisal of dying from smallpox once you have it (one in three) is only about 1.6 bits (i.e. more likely than 2 heads in 2 tosses).

Thus surprisal: (i) has meaning which is easy to remind yourself of with a few coins in your pocket, (ii) reduces huge numbers to much more intuitive size, and (iii) allows one to combine risks from independent events with addition/subtraction rather than multiplication/division.

For instance (from the numbers suggested above) your chance of dying is decreased by getting the vaccination, as long as the surprisal of getting smallpox without the vaccination is less than about $20 - 2 \approx 18$ bits (more precisely $19.9 - 1.6 \approx 18.3$ bits). That means that vaccination is your best bet (absent other information) if your chance of being exposed to smallpox is greater than about 1 in 333,333, i.e. roughly that of getting 18 heads in 18 tosses (1 in $2^{18} = 262{,}144$).
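The arithmetic of this example, as a short sketch using the probabilities quoted above (one in a million, one in three). It assumes the two risks combine as independent events, so that their surprisals add; the variable names are illustrative.

```python
import math

def surprisal_bits(p):
    return math.log2(1.0 / p)

p_die_vaccine = 1e-6         # one in a million (figure quoted in the text)
p_die_given_smallpox = 1 / 3 # one in three (figure quoted in the text)

s_vaccine = surprisal_bits(p_die_vaccine)          # about 19.9 bits
s_disease = surprisal_bits(p_die_given_smallpox)   # about 1.6 bits

# Surprisals of independent events add, so vaccinating lowers risk when
# s_exposure + s_disease > s_vaccine fails, i.e. when s_exposure < s_vaccine - s_disease.
s_threshold = s_vaccine - s_disease
print(f"{s_vaccine:.2f} bits, {s_disease:.2f} bits, threshold {s_threshold:.2f} bits")
print(f"break-even exposure probability: 1 in {2**s_threshold:,.0f}")
```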

Given the large difference between something with 2 bits of surprisal and something with 18, communications bandwidth might be better spent by news media providing us with numbers on observed surprisal, rather than by reporting only that "there's a chance" of something bad (or good) happening. Saying the latter treats your audience as consumers of spin rather than information.

Likewise, use of surprisals in communicating and monitoring risks to medical patients could make patient decisions about actions with a small chance of dire outcomes as informed as possible. This could reduce the costs of medical malpractice in the long run by empowering patients with tools to make informed and responsible choices, making the need for legal redress less frequent.

FIG. 8: Available work/gram vs. ambient temperature T for 0° and 100° water in various states.

Thus media serving a risk-assessing public could play a key role in reducing the costs of defensive medicine. Some might even enjoy surprisal data on the small probabilities associated with some gambling opportunities. After all, there really is more to the lottery than simply knowing "the size of the pot".



Appendix D: Matchup multiplicities 

Of the integrative concepts discussed in this paper, the least familiar to physicists (judging from textbooks, at least) may be those associated with the logarithmic correlation measure often referred to as KL-divergence, and its multiplicity: in effect a kind of normalized choice-reduction factor which is never less than 1 (since KL-divergence itself is never negative). Hence in this Appendix we discuss matchup-multiplicity connections to: (i) available work, (ii) model-selection math, important at least for physicists involved in cross-disciplinary work, and (iii) the evolution of complex analog as well as digital systems.
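A minimal sketch for discrete distributions, assuming the matchup multiplicity is the exponential of KL-divergence in nats (equivalently 2 raised to the KL-divergence in bits). Under that assumption it is never less than 1, and equals 1 only when the two distributions match. The function names are hypothetical.

```python
import math
from typing import Sequence

def kl_divergence_nats(p: Sequence[float], q: Sequence[float]) -> float:
    """KL-divergence sum_i p_i ln(p_i / q_i); terms with p_i == 0 contribute 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf  # p assigns weight where q assigns none
            total += pi * math.log(pi / qi)
    return total

def matchup_multiplicity(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed definition: exp(KL in nats) = 2**(KL in bits), never below 1."""
    return math.exp(kl_divergence_nats(p, q))

uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.7, 0.1, 0.1, 0.1]
print(kl_divergence_nats(peaked, uniform) / math.log(2), "bits")
print(matchup_multiplicity(peaked, uniform))
print(matchup_multiplicity(uniform, uniform))  # identical distributions -> 1.0
```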



1. work's availability 

Best-guess states (e.g. for atoms in a gas) are inferred by maximizing the average surprisal S (entropy) for a given set of control parameters (like pressure P or volume V). This constrained entropy maximization, both classically and quantum mechanically, minimizes the Gibbs availability in entropy units, $A = -k\ln[Z]$, where Z is a constrained multiplicity or partition function.
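A small consistency check, in units where k = 1, for a hypothetical two-level system with energies 0 and ε at temperature T. It verifies that the maximum-entropy (canonical) probabilities give U − TS equal to T·A = −T ln Z. The two-level system is an assumption chosen only for illustration.

```python
import math

def two_level_check(eps: float, T: float) -> tuple[float, float]:
    """Return (U - T*S, T*A) for energies (0, eps), with k = 1 and A = -ln Z."""
    energies = [0.0, eps]
    weights = [math.exp(-e / T) for e in energies]
    Z = sum(weights)                                   # partition function
    probs = [w / Z for w in weights]                   # maximum-entropy probabilities
    U = sum(p * e for p, e in zip(probs, energies))    # mean energy
    S = -sum(p * math.log(p) for p in probs)           # entropy in units of k
    A = -math.log(Z)                                   # availability in entropy units
    return U - T * S, T * A

helmholtz_direct, helmholtz_from_A = two_level_check(eps=1.0, T=0.5)
print(helmholtz_direct, helmholtz_from_A)  # agree to rounding error
```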

When absolute temperature T is fixed, free-energy (T times A) is also minimized. Thus if T, V and number of molecules N are constant, the Helmholtz free energy $F = U - TS$ (where U is energy) is minimized as a system "equilibrates". If T and P are held constant (say during processes in your body), the Gibbs free energy $G = U + PV - TS$ is minimized instead. The change in free energy under these conditions is a measure of available work that might be done in the process. Thus available work for an ideal gas at constant temperature $T_o$ and pressure $P_o$ is

$$W = \Delta G = NkT_o\,\Theta[V/V_o],$$

where $V_o = NkT_o/P_o$ and, by the Gibbs inequality, $\Theta[x] \equiv x - 1 - \ln[x] \ge 0$.
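A numerical cross-check, in hypothetical units with NkT_o = 1. It assumes the standard exergy form ΔU + P_oΔV − T_oΔS, which for an ideal gas already at T_o has ΔU = 0 and T_oΔS = NkT_o ln(V/V_o). The code compares that form with the Θ expression above and confirms it never goes negative.

```python
import math

def theta(x: float) -> float:
    """Gibbs-inequality function x - 1 - ln(x), which is >= 0 for x > 0."""
    return x - 1.0 - math.log(x)

def available_work_isothermal(V: float, N_k_To: float = 1.0, P_o: float = 1.0) -> tuple[float, float]:
    """Return (Theta form, direct exergy form) of the available work for an ideal gas at T = T_o."""
    V_o = N_k_To / P_o                               # volume in equilibrium with the ambient
    w_theta = N_k_To * theta(V / V_o)
    # Direct form: dU + P_o dV - T_o dS, with dU = 0 and T_o dS = N k T_o ln(V/V_o).
    w_direct = P_o * (V - V_o) - N_k_To * math.log(V / V_o)
    return w_theta, w_direct

for V in (0.25, 1.0, 4.0):
    print(V, available_work_isothermal(V))
```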

More generally, the work available relative to some ambient is obtained by multiplying the ambient temperature $T_a$ by the KL-divergence or net surprisal $\Delta I \ge 0$, defined as the average value of $k\ln[p/p_a]$, where $p_a$ is the probability of a given state under ambient conditions (and p its probability in the system of interest). For instance, the work available in equilibrating a monatomic ideal gas to ambient values $V_a$ and $T_a$ is thus $W = T_a\,\Delta I$, where the KL-divergence is

$$\Delta I = Nk\left(\Theta[V/V_a] + \tfrac{3}{2}\,\Theta[T/T_a]\right).$$

The resulting contours of constant KL-divergence put limits on the conversion of hot to cold, as in flame-powered air-conditioning, or in the unpowered device to convert boiling water to ice water (Fig. 8). Thus KL-divergence (also known to engineers as available work or exergy, in units of $kT_a$) measures thermodynamic availability in bits.
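A sketch of the monatomic-gas expression per atom (N = 1, k = 1), converted to bits, with the direct exergy form ΔU + P_aΔV − T_aΔS computed alongside as a check. The chosen state (gas at 373.15 K relaxing to a 273.15 K ambient at the ambient volume) is an illustrative assumption, not data from Fig. 8.

```python
import math

def theta(x: float) -> float:
    """Gibbs-inequality function x - 1 - ln(x) >= 0."""
    return x - 1.0 - math.log(x)

def kl_bits_per_atom(T: float, V: float, T_a: float, V_a: float) -> float:
    """Net surprisal Delta-I per atom of a monatomic ideal gas relative to ambient, in bits."""
    nats = theta(V / V_a) + 1.5 * theta(T / T_a)
    return nats / math.log(2)

def exergy_per_atom_direct(T: float, V: float, T_a: float, V_a: float) -> float:
    """dU + P_a dV - T_a dS per atom with k = 1, so that P_a = T_a / V_a."""
    P_a = T_a / V_a
    dU = 1.5 * (T - T_a)
    dS = math.log(V / V_a) + 1.5 * math.log(T / T_a)
    return dU + P_a * (V - V_a) - T_a * dS

# Assumed illustrative state: boiling-point gas, freezing-point ambient, same volume.
T, V, T_a, V_a = 373.15, 1.0, 273.15, 1.0
bits = kl_bits_per_atom(T, V, T_a, V_a)
print(f"{bits:.3f} bits per atom")
# W = T_a * Delta-I, with Delta-I converted back from bits to nats (k = 1).
print(exergy_per_atom_direct(T, V, T_a, V_a), T_a * bits * math.log(2))
```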



2. model selection math 

In spite of (i) the central role of model selection in all observational science, and (ii) the math background required to do physics, the mathematics of model selection is sometimes barely an afterthought in physics texts and assessments. Introductory physics texts sometimes even define models as simplified versions of physical systems^, instead of as idea-based representations that (like the molecule-based code-strings of one-celled organisms) help us correlate our behaviors with the world around us.

Hence the background of physicists (e.g. on dissertation committees of biophysics students whose project literature requires a mathematical approach to model selection) may lead them to think that parameter estimation and model selection are one and the same. In this section we follow astronomer Phil Gregory's discussion of Occam factors to show how the two relate, and point to an interesting strategy for getting quick answers that so far seems to be better known in ecology than in physics.

We begin with Gregory's Occam factor^, defined as $\Omega_M = p[D|\hat{A},M]/p[D|M]$, i.e. as the factor by which the likelihood of a set of data D is increased by considering only the most likely set $\{\hat{A}_k\}$ of $k = 1$ to $K$ fit-parameter values $\{A_k\}$ associated with a given model M (here $p[D|M]$ is the likelihood averaged over the prior range of those parameters). One might then use this to predict the surprisal of data generated by reality's unknown operating probability q, namely $S_{p/q} = \langle \ln[1/p] \rangle_q$, given that model:

$$S_{p/q} \approx \ln\frac{1}{p[D|M]} = \ln\frac{1}{p[D|\hat{A},M]} + \ln\Omega_M. \qquad \text{(D1)}$$

FIG. 9: Probabilities of constant vs. linear fits using AIC.

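Although we can't calculate the KL-divergence $I_{p/q} = S_{p/q} - S_{q/q}$ directly, because we don't know $S_{q/q}$, the argument is that the best model M available (in the absence of inside information about reality q) is the one that is least surprised by the available data D, i.e. for which $S_{p/q}$ is least. Since $S_{q/q}$ is the same for every candidate model, this also picks out the model with the smallest $I_{p/q}$.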
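A minimal simulation of this argument with hypothetical discrete distributions. Data D are sampled from a "true" q that an analyst would not know, and each candidate model's S_{p/q} is estimated as the average surprisal of those samples. The least-surprised model is the winner. The code can also report each I_{p/q} only because, as a simulation, it knows q.

```python
import math
import random

def average_surprisal_nats(samples, model):
    """Estimate S_{p/q}: mean of ln(1/p[x]) over samples x drawn from the unknown q."""
    return sum(math.log(1.0 / model[x]) for x in samples) / len(samples)

def cross_entropy(q, p):
    """Exact S_{p/q} = sum_x q[x] ln(1/p[x]); requires q, so usable only in a simulation."""
    return sum(qx * math.log(1.0 / p[x]) for x, qx in q.items() if qx > 0)

# Hypothetical 'reality' and two candidate models over outcomes a, b, c.
q_true = {"a": 0.5, "b": 0.3, "c": 0.2}
models = {
    "M1": {"a": 0.45, "b": 0.35, "c": 0.20},
    "M2": {"a": 0.80, "b": 0.10, "c": 0.10},
}

rng = random.Random(1)
data = rng.choices(list(q_true), weights=list(q_true.values()), k=20000)

scores = {name: average_surprisal_nats(data, p) for name, p in models.items()}
best = min(scores, key=scores.get)
S_qq = cross_entropy(q_true, q_true)   # entropy of q: identical for every model
for name, p in models.items():
    print(name, round(scores[name], 3), "KL =", round(cross_entropy(q_true, p) - S_qq, 3))
print("least-surprised model:", best)
```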

Note that the first term in the minimized quantity on the right-hand side of Equation (D1) is just the negative log-likelihood of the fit, i.e. the quantity traditionally minimized when choosing the optimum set of parameters for a given model. For example, when least-squares fitting N data points with normally distributed errors having a mean μ and standard deviation σ, that first term is a non-varying constant plus a second constant times the average square of data deviations from the mean, i.e. the model's variance or mean-square error.
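A sketch of that decomposition with hypothetical data, model predictions, and a known error width σ. The negative log-likelihood of N independent normal errors equals the constant (N/2) ln(2πσ²) plus N/(2σ²) times the mean-square error, which is why minimizing it is least-squares fitting.

```python
import math

def neg_log_likelihood(data, predictions, sigma):
    """-ln L for independent normal errors of known standard deviation sigma."""
    return sum(
        0.5 * math.log(2 * math.pi * sigma**2) + (d - m) ** 2 / (2 * sigma**2)
        for d, m in zip(data, predictions)
    )

def nll_via_mse(data, predictions, sigma):
    """Same quantity written as a constant plus a constant times the mean-square error."""
    n = len(data)
    mse = sum((d - m) ** 2 for d, m in zip(data, predictions)) / n
    return 0.5 * n * math.log(2 * math.pi * sigma**2) + n * mse / (2 * sigma**2)

data = [1.1, 1.9, 3.2, 3.9, 5.1]   # hypothetical measurements
fit = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical model predictions
print(neg_log_likelihood(data, fit, 0.2), nll_via_mse(data, fit, 0.2))
```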

When choosing one model over another, however, the second "Occam-factor" term must also be considered in the analysis. Gregory provides a lovely general expression for Occam factors in the linear least-squares case, in terms of prior probabilities for the fit parameters and the parameter-function covariance matrix. On the other hand, statisticians have set up simpler rules of thumb for this Occam factor, also grounded in Bayesian inference and in the connection to KL-divergence described here.
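For the simplest case, the sketch below takes the Occam factor to be the ratio of the best-fit likelihood to the prior-averaged likelihood p[D|M]. It assumes a one-parameter constant model with known noise σ and a flat prior on the constant over an assumed range. Numerical integration is compared with the standard one-parameter Gaussian (Laplace) approximation, prior width / (σ_A √(2π)). That approximation is a textbook special case, not necessarily Gregory's general expression. The data are hypothetical.

```python
import math

def log_likelihood(A, data, sigma):
    """ln p[D | A, M] for the constant model d_i = A + normal noise of known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2) - (d - A) ** 2 / (2 * sigma**2) for d in data)

def ln_occam_factor_numeric(data, sigma, a_min, a_max, n_grid=20001):
    """ln of p[D|A_hat,M] / p[D|M], with a flat prior on A over [a_min, a_max]."""
    a_hat = sum(data) / len(data)                  # maximum-likelihood A
    ll_max = log_likelihood(a_hat, data, sigma)
    step = (a_max - a_min) / (n_grid - 1)
    # Trapezoid rule for the integral of p[D|A,M]/p[D|A_hat,M] over A.
    vals = [math.exp(log_likelihood(a_min + i * step, data, sigma) - ll_max) for i in range(n_grid)]
    integral = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return -math.log(integral / (a_max - a_min))

def ln_occam_factor_gaussian(data, sigma, a_min, a_max):
    """One-parameter Gaussian approximation: prior width / (sigma_A * sqrt(2 pi))."""
    sigma_A = sigma / math.sqrt(len(data))
    return math.log((a_max - a_min) / (sigma_A * math.sqrt(2 * math.pi)))

data = [2.1, 1.8, 2.3, 2.0, 1.9]   # hypothetical measurements
print(ln_occam_factor_numeric(data, 0.2, 0.0, 10.0), ln_occam_factor_gaussian(data, 0.2, 0.0, 10.0))
```

A wider prior range raises the Occam penalty, which is how the second term in Equation (D1) punishes loosely constrained parameters.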
The Akaike information criterion (AIC) may be the best-documented^{35,36} of these rules of thumb. Although it traditionally uses an information-units constant of 2 instead of 1 for nats and 1/ln[2] for bits, AIC given enough data basically replaces the Occam-factor term on the right-hand side of Equation (D1) with a model penalty of K nats, where K is the number of fit parameters in the model. Use of AIC to decide whether two sets of data points are better modeled with a sloped line (K = 2) or with a constant (K = 1) is illustrated in Figure 9.
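A minimal sketch of such a comparison. It assumes normal errors of known σ, so that −2 ln L_max reduces to SSE/σ² plus a constant shared by both models, and uses AIC = 2K − 2 ln L_max. Relative model probabilities ("Akaike weights") are taken as proportional to exp(−AIC/2). The data are hypothetical and do not reproduce Fig. 9.

```python
import math

def sse_constant(xs, ys):
    """Residual sum of squares for the best constant fit y = c (c = mean of ys)."""
    c = sum(ys) / len(ys)
    return sum((y - c) ** 2 for y in ys)

def sse_line(xs, ys):
    """Residual sum of squares for the least-squares line y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def akaike_weights(xs, ys, sigma):
    """Relative probabilities of the constant (K=1) and line (K=2) models via AIC."""
    # With known sigma, -2 ln L_max = SSE / sigma**2 + (a constant common to both models).
    aic = {
        "constant": 2 * 1 + sse_constant(xs, ys) / sigma**2,
        "line": 2 * 2 + sse_line(xs, ys) / sigma**2,
    }
    best = min(aic.values())
    raw = {m: math.exp(-(v - best) / 2) for m, v in aic.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}

xs = [0, 1, 2, 3, 4]
flat_ys = [1.0, 1.1, 0.9, 1.0, 1.05]
sloped_ys = [1.0, 1.5, 2.1, 2.4, 3.0]
print(akaike_weights(xs, flat_ys, sigma=0.1))    # the extra slope parameter is not worth its penalty
print(akaike_weights(xs, sloped_ys, sigma=0.1))  # the line wins decisively
```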



3. layered correlations 

Does it seem strange that the second law of thermodynamics is expressed in terms of some observer's knowledge about a system's state? This strangeness is exacerbated by the fact that many of the analog systems to which thermodynamics was traditionally applied (like macroscopic amounts of gas and condensed matter) allow one to treat entropy as an extensive state variable, by neglecting correlations between subsystems.

More generally, of course, $S_A + S_B - S_{AB} \ge 0$, with the difference being none other than the mutual information, i.e. the KL-divergence $I_{q/p}$ between the operating set of probabilities p and the product of marginal probabilities q taken for the two systems separately^. With nano-sized systems, as well as with systems involving digital codes (built from either ideas or molecules), these correlations can no longer be ignored.
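The identity can be checked numerically for a hypothetical two-subsystem joint distribution p[a][b]. The sketch computes the mutual information once as S_A + S_B − S_AB and once as the KL-divergence of the joint probabilities from the product of marginals.

```python
import math

def entropy(probs):
    """Shannon entropy, in nats, of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mutual_information_two_ways(joint):
    """Return (S_A + S_B - S_AB, KL of joint from product of marginals), both in nats."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    via_entropies = entropy(p_a) + entropy(p_b) - entropy(p for row in joint for p in row)
    via_kl = sum(
        p * math.log(p / (p_a[i] * p_b[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )
    return via_entropies, via_kl

correlated = [[0.4, 0.1], [0.1, 0.4]]      # hypothetical joint probabilities
independent = [[0.25, 0.25], [0.25, 0.25]]
print(mutual_information_two_ways(correlated))
print(mutual_information_two_ways(independent))  # both essentially zero
```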

Of course these correlations are also not ignored by the second law, which in effect treats knowledge (residing outside a system) concerning a system's detailed state as a subsystem correlation. This correlation can be lost by accidental alterations to either the system or the knowledge repository, so that thermodynamic information (like any correlation between subsystems) is a delocalized physical quantity.

Such subsystem correlations (including exergy over $kT_o$, as discussed in Appendix D.1) naturally evolve over time when a stream of ordered energy (e.g. in the form of 2 eV solar photons) is available. In some cases, layered correlations that look in and out from a hierarchy of subsystem boundaries manage to evolve as well 5 ^.

Physical layer boundaries emerge with the institutionalization of broken symmetries, as in the case of simpler physical systems^{51,52}, whether that boundary is a starlight-illuminated planetary surface, the bilayer membrane which separates the inside from the outside of a living cell, or the edge of a gene pool defined by the emergence of behaviors that treat family offspring differently from other offspring within a species. Although the mix of boundary types on a given level is perhaps bewildering, the number of boundary layers in any given hierarchy is reasonably small and well defined.

KL-divergence also offers a mathematical template with which to inventory such layered correlations. The mathematics of pair, triplet, quadruplet, etc. subsystem correlations has, for example, been worked out in the context of studies on neural networks^{38}. Pair correlations go a long way toward explaining behaviors that look outward from a given boundary 5 - 3 -, but of course post-pair correlations may be crucial for maintaining subsystem correlations that look inward from the next boundary up.
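A standard textbook case makes the pair vs. post-pair distinction concrete. This is our illustration, not the neural-network formalism of ref. 38. With X and Y independent fair bits and Z = X XOR Y, every pair of variables has zero mutual information. Yet the three together carry 1 bit of total correlation (multi-information, S_X + S_Y + S_Z − S_XYZ), held only at the triplet level.

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(counter, total):
    """Shannon entropy, in bits, of an empirical distribution given as counts."""
    return -sum((c / total) * math.log2(c / total) for c in counter.values())

# The four equally likely (x, y) pairs, each extended with z = x XOR y.
states = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
n = len(states)

def marginal_entropy(indices):
    """Entropy in bits of the variables at the given positions."""
    return entropy_bits(Counter(tuple(s[i] for i in indices) for s in states), n)

pair_mi = {
    pair: marginal_entropy(pair[:1]) + marginal_entropy(pair[1:]) - marginal_entropy(pair)
    for pair in [(0, 1), (0, 2), (1, 2)]
}
total_correlation = sum(marginal_entropy((i,)) for i in range(3)) - marginal_entropy((0, 1, 2))
print(pair_mi)            # every pair: 0 bits of mutual information
print(total_correlation)  # 1 bit, carried only by the triplet
```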

What an understanding of the physical context tells us, however, is that we have not yet addressed the challenge of modeling a hierarchy of correlation layers, e.g. those that look in and out from physical boundaries like cell walls, tissue boundaries, metazoan skins, and molecule and idea code-pool edges. Of course the physical context described here also suggests ways to approach that analytical challenge.

For example, task (or niche-network) layer multiplicity has been suggested as a choice-multiplicity estimate of the effective number of correlation layers in any community of social metazoans^. The estimate is intended to model (in some monotone fashion) the physical (per capita) matchup-multiplicity maximum for that community, whose precise quantitation would require information that we don't have. Models like the one in Fig. 10, chosen to be (i) understandable, (ii) amenable to non-invasive monitoring, and where possible (iii) correlation-nurturing, might in the days ahead allow physicists to help out in new ways.
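A hedged sketch of the choice-multiplicity idea. It assumes, hypothetically, that a community's per-capita effort is divided among correlation layers in known fractions f_j, and takes the effective number of layers to be 2^S with S = −Σ f_j log2 f_j in bits. On that assumption the count equals the number of layers when effort is spread evenly and falls toward 1 as effort concentrates in one layer. The layer names and fractions are illustrative only and are not the published task-layer model.

```python
import math

def effective_layer_count(fractions):
    """Choice multiplicity 2**S, with S the Shannon entropy (bits) of the effort fractions."""
    total = sum(fractions.values())
    s_bits = -sum((f / total) * math.log2(f / total) for f in fractions.values() if f > 0)
    return 2 ** s_bits

# Hypothetical per-capita effort fractions across correlation layers.
even = {"self": 0.25, "family": 0.25, "community": 0.25, "culture": 0.25}
skewed = {"self": 0.85, "family": 0.10, "community": 0.04, "culture": 0.01}
print(effective_layer_count(even))    # 4.0: all four layers equally engaged
print(effective_layer_count(skewed))  # well below 4: effort concentrated mostly in one layer
```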